Add pack-index object existence checker strategy for prefetch #2003
Open
tyrielv wants to merge 1 commit into
c090d78 to 743b0fc
Introduce IObjectExistenceChecker strategy pattern to decouple blob prefetch from libgit2's git_revparse_single, which is extremely slow for missing objects (~2.8ms/op with 14 packs in a large GVFS cache).

New PackIndexObjectExistenceChecker reads MIDX and supplemental .idx files directly in managed code via memory-mapped IO (~5us/op), with loose-object File.Exists fallback.

Gated on gvfs.prefetch-use-idx git config (default: false).

Components:
- IObjectExistenceChecker: strategy interface
- RevParseObjectExistenceChecker: wraps existing LibGit2Repo.ObjectExists
- PackIndexObjectExistenceChecker: MIDX + pack idx + loose fallback
- MidxReader: memory-mapped MIDX v1 parser with binary search
- PackIndexReader: memory-mapped pack index v2 parser with binary search
- FindBlobsStage: accepts optional checker factory (backward compatible)
- BlobPrefetcher: reads config, creates appropriate checker factory

Searches both LocalObjectsRoot and GitObjectsRoot (shared cache), detects supplemental packs not yet in MIDX via PNAM chunk diffing, and safely falls back to revparse on initialization errors.

Unit tests cover: MIDX/idx hit and miss, all 256 fanout buckets, supplemental pack detection, loose objects, empty/missing pack dirs, multiple object roots, corrupt file handling, and deduplication.

Assisted-by: Claude Opus 4.6
Signed-off-by: Tyrie Vella <tyrielv@gmail.com>
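For readers unfamiliar with the lookup technique the commit message names, here is a minimal sketch of a pack index v2 existence check (fanout table plus binary search over the sorted object names). It is written in Python for illustration only and is not the PR's managed-code implementation. The names `idx_contains`, `build_idx`, and `loose_object_path` are hypothetical, SHA-1 (20-byte) object ids are assumed, and `build_idx` emits only the header, fanout, and name table, which is enough for existence checks but is not a complete index that git would accept.

```python
import hashlib
import mmap
import os
import struct
import tempfile

IDX_MAGIC = b"\xfftOc"   # pack index v2 signature
HEADER_SIZE = 8          # 4-byte magic + 4-byte version
FANOUT_SIZE = 256 * 4    # 256 big-endian uint32 cumulative counts
OID_SIZE = 20            # assumption: SHA-1 object ids


def idx_contains(buf, oid: bytes) -> bool:
    """Return True if `oid` is listed in a pack index v2 buffer (bytes or mmap)."""
    if len(oid) != OID_SIZE:
        raise ValueError("expected a 20-byte object id")
    if len(buf) < HEADER_SIZE + FANOUT_SIZE or buf[:4] != IDX_MAGIC:
        raise ValueError("not a pack index v2 buffer")
    if struct.unpack_from(">I", buf, 4)[0] != 2:
        raise ValueError("unsupported pack index version")

    names = HEADER_SIZE + FANOUT_SIZE
    total = struct.unpack_from(">I", buf, HEADER_SIZE + 4 * 255)[0]
    if names + total * OID_SIZE > len(buf):
        raise ValueError("truncated name table")

    # fanout[i] counts objects whose first byte is <= i, so the bucket for
    # first byte b is the half-open range [fanout[b - 1], fanout[b]).
    first = oid[0]
    lo = 0 if first == 0 else struct.unpack_from(">I", buf, HEADER_SIZE + 4 * (first - 1))[0]
    hi = struct.unpack_from(">I", buf, HEADER_SIZE + 4 * first)[0]

    # Binary search inside the bucket; names are stored in sorted order.
    while lo < hi:
        mid = (lo + hi) // 2
        start = names + mid * OID_SIZE
        candidate = buf[start:start + OID_SIZE]
        if candidate == oid:
            return True
        if candidate < oid:
            lo = mid + 1
        else:
            hi = mid
    return False


def build_idx(oids) -> bytes:
    """Build a lookup-only index (header + fanout + names) for demos and tests."""
    names = sorted(set(oids))
    fanout = [0] * 256
    for name in names:
        fanout[name[0]] += 1
    for i in range(1, 256):
        fanout[i] += fanout[i - 1]   # make the counts cumulative
    return IDX_MAGIC + struct.pack(">I", 2) + struct.pack(">256I", *fanout) + b"".join(names)


def loose_object_path(objects_root: str, oid: bytes) -> str:
    """Path of a loose object: <root>/<first 2 hex chars>/<remaining 38>."""
    hex_id = oid.hex()
    return os.path.join(objects_root, hex_id[:2], hex_id[2:])


if __name__ == "__main__":
    known = [hashlib.sha1(str(i).encode()).digest() for i in range(1000)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.idx")
        with open(path, "wb") as f:
            f.write(build_idx(known))
        # Memory-map the file read-only, as the commit message describes.
        with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            print(idx_contains(view, known[0]))
            print(idx_contains(view, hashlib.sha1(b"absent").digest()))
```

A miss costs one fanout read plus a handful of 20-byte comparisons, with no object parsing, which is the general reason this kind of lookup is cheap for objects that are not present. A checker in the style described above would try the index first and only then test whether `loose_object_path(...)` exists on disk.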
743b0fc to 13d4935
Summary
Introduces a strategy pattern (IObjectExistenceChecker) to decouple blob prefetch from libgit2's git_revparse_single, which is extremely slow for missing objects (~2.8ms/op with 14 packs in a large GVFS cache).

New PackIndexObjectExistenceChecker reads MIDX and supplemental .idx files directly in managed code via memory-mapped IO (~5μs/op), with loose-object File.Exists fallback. 460x faster for cache-miss lookups.

Gated on config gvfs.prefetch-use-idx. Default: false (existing revparse behavior unchanged).

Components
- IObjectExistenceChecker
- LibGit2ObjectExistenceChecker: LibGit2Repo.ObjectExists
- PackIndexObjectExistenceChecker
- MidxReader
- PackIndexReader

Design decisions
LocalObjectsRoot and GitObjectsRoot (shared cache), de-duplicated FindBlobsStage workers, wrapped in NonDisposingCheckerWrapper

Benchmark (59.7M objects, 14 packs)